Determining relative motion as input

ABSTRACT

Input can be provided to a computing device based upon relative movement of a user or other object with respect to the device. In some embodiments, infrared radiation is used to determine measurable aspects of the eyes or other features of a user. Since the human retina is a retro-reflector for certain wavelengths, using two different wavelengths or two measurement angles can allow user pupils to be quickly located and measured without requiring resource-intensive analysis of full color images captured using ambient light, which can be important for portable, low power, or relatively inexpensive computing devices. Various embodiments provide differing levels of precision and design that can be used with different devices.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of allowed U.S. application Ser. No. 12/786,297, entitled “DETERMINING RELATIVE MOTION AS INPUT,” filed May 24, 2010, the full disclosure of which is incorporated herein by reference for all purposes.

BACKGROUND

As the variety of available computing devices increases, and as the size of many of these devices decreases, there comes a need to adapt the ways in which users interface with these computing devices. For example, while typing on a keyboard is an easy and acceptable way for many users to input information for a desktop computer, trying to enter information on a keyboard of a portable phone can be difficult due to the small form factor of the device. For example, the size of a user's fingers can prevent that user from easily pressing one key at a time. Further, as many of these devices move to touch screens or other such input devices, the size of a user's finger can also inhibit the user from successfully selecting an intended object or element on the screen, etc. Another disadvantage to using such touch screens is that fingerprints, dirt, smudges, and other remnants are left on the display screen, which can cause glare or other issues with clarity and/or visibility. Some users add an extra layer of protective material to prevent damage to the screen, but these layers can reduce touch sensitivity and amplify the negative effects of the residue left on the screen.

Some portable devices utilize movement of the device as a type of input, wherein a user can tilt a device in a particular direction to provide a specific input. The types of input that can be provided by such mechanisms are limited, and require that the user be holding the device in order to provide the input. Further, the device does not account for relative motion. For example, if the user lies down while using the device, the change in orientation might cause the device to register input even though the relative orientation of the device with respect to the user is substantially unchanged.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 illustrates an example device including components that can be used to provide input in accordance with various embodiments;

FIG. 2 illustrates an example component-level view of a device that can be used in accordance with various embodiments;

FIG. 3 illustrates a configuration wherein a device with two imaging elements captures two images of a user in accordance with one embodiment;

FIGS. 4(a) and (b) illustrate different head positions of a user in images captured from offset cameras in accordance with one embodiment;

FIGS. 5(a)-(c) illustrate an example process for determining image offset that can be used in accordance with a first embodiment;

FIGS. 6(a)-6(b) illustrate analysis of facial features of a user in accordance with various embodiments;

FIGS. 7(a)-7(c) illustrate an example of capturing eye movement of a user as input in accordance with one embodiment;

FIGS. 8(a)-8(c) illustrate an approach to determining retina location from a pair of images that can be used in accordance with one embodiment;

FIG. 9 illustrates an example process for determining relative position of at least one aspect of a user of a computing device that can be used in accordance with a first embodiment;

FIG. 10 illustrates an example imaging approach that can be used in accordance with one embodiment;

FIG. 11 illustrates an example image that can be captured using the approach of FIG. 10;

FIGS. 12(a) and (b) illustrate an example imaging approach that can be used in accordance with one embodiment;

FIGS. 13(a)-(e) illustrate an example process for determining distance to a user based on image offset that can be used in accordance with a first embodiment;

FIG. 14 illustrates an example approach for determining distance to a user being imaged that can be used in accordance with one embodiment;

FIGS. 15(a) and (b) illustrate an example approach for determining distance based on optical focus that can be used in accordance with one embodiment; and

FIG. 16 illustrates an example environment in which various embodiments can be implemented.

DETAILED DESCRIPTION

Systems and methods in accordance with various embodiments of the present disclosure may overcome one or more of the aforementioned and other deficiencies experienced in conventional approaches to providing input to a computing device. In particular, approaches discussed herein enable the device to determine and/or track the relative position, orientation, and/or motion of at least one aspect of a user, or other object, with respect to the device, which can be interpreted as input to the computing device.

In one embodiment, at least one image capture element of a computing device is used to image at least a portion of a user. The image capture element can utilize ambient light surrounding the device or user, or can rely upon light emitted from a display element or other component of the electronic device. In other embodiments, at least one image capture element is used that captures infrared (IR) or other radiation emitted from a component (e.g., an emitter such as an IR light emitting diode (LED) or laser diode) of the computing device, and reflected by the user. In some embodiments, both an ambient light camera and one or more infrared detectors are used to determine aspects of relative position and/or movement.

Certain approaches can utilize image recognition to track aspects of a user for use in providing input to the device. Examples of such approaches can be found in co-pending U.S. patent application Ser. No. 12/332,049, filed Dec. 10, 2008, entitled “Movement Recognition as Input Mechanism,” which is hereby incorporated herein by reference. For certain portable or low power devices, however, standard image recognition using ambient light and full color images may not be optimal, as the analysis can require a significant amount of processing capacity, resource usage, battery power, and other such aspects. Further, for device control purposes it can be desirable in at least some embodiments to monitor the user at a rate of 30 frames per second or faster, which can be difficult (or at least particularly resource and power intensive) when full color images must be analyzed. In some cases a significant amount of the processing can be pushed to a remote processing system, but latency, bandwidth, and other such issues can prevent such an approach from working in all cases.

Accordingly, several embodiments described and suggested herein utilize infrared radiation, or other ranges of radiation that are outside the range of viewable light that is detectable by a human user. In addition to being imperceptible by a user, such that the user experience is not degraded if the user is illuminated with such radiation, IR can provide a relatively inexpensive tracking mechanism by taking advantage of the properties of the human eyes to obtain at least one point source. For example, the human retina is a retro-reflector, such that light is reflected back at substantially the same angle in which the light was incident on the retina. Thus, light from one angle will not be reflected back from the retina along another (substantially different) angle. Further, the human eye absorbs certain wavelengths, such that light of one wavelength may be reflected by the retina while light of another wavelength may be absorbed by the cornea and/or other portions of the eye, or otherwise not reflected back.

These properties enable two images to be captured that can be low-color or grayscale in nature, as the portions of interest will either show reflection or show little to no reflection at the position of the pupils, for example. If one image is captured that includes the reflected light from the retinas, and another image is captured that does not include the reflected light, the images can be compared to quickly determine the relative location and dimensions of the user's pupils (or other such features). Since other features of the user will generally reflect the same for each image, an image comparison can readily reveal the relative position of the pupils without a significant amount of image processing.

In various embodiments, a running difference can be performed between images including (and not including) the light reflected from the retinas. Subtracting the absolute values of the pairs of images will leave substantially two disc-shaped features corresponding to the relative positions of the user's pupils (as well as those of anyone else in the view) such that changes in position or direction can quickly be determined and monitored over time. There can be features in the subtracted image pairs that result from movement or other occurrences, but these features typically will not be disc shaped and can readily be removed from consideration.
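As an illustration only, the image comparison described above can be sketched as a per-pixel absolute difference followed by a threshold. The function names, the list-of-rows grayscale representation, and the threshold value are assumptions made for this sketch and do not come from the disclosure. A fuller version would also separate the flagged pixels into regions and discard regions that are not disc shaped.

```python
def difference_mask(bright, dark, threshold=50):
    """Flag pixels whose intensity differs strongly between two images.

    bright: grayscale image that includes the retinal reflection.
    dark:   grayscale image that does not include the reflection.
    Both are equal-sized lists of rows of 0-255 values (assumed format).
    The threshold of 50 is an arbitrary illustrative value.
    """
    if len(bright) != len(dark) or any(
        len(a) != len(b) for a, b in zip(bright, dark)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [1 if abs(b - d) >= threshold else 0 for b, d in zip(row_b, row_d)]
        for row_b, row_d in zip(bright, dark)
    ]


def mask_centroid(mask):
    """Return the (row, col) centroid of flagged pixels, or None if empty."""
    points = [
        (r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v
    ]
    if not points:
        return None
    count = len(points)
    return (
        sum(r for r, _ in points) / count,
        sum(c for _, c in points) / count,
    )


# Tiny synthetic example: a uniform face region with one bright "pupil".
dark_image = [[100] * 5 for _ in range(5)]
bright_image = [row[:] for row in dark_image]
bright_image[2][2] = 250
pupil_mask = difference_mask(bright_image, dark_image)
pupil_center = mask_centroid(pupil_mask)
```

Because features other than the pupils reflect about the same in both images, most of the mask is zero, which is why so little processing is needed to locate the remaining bright regions.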

In some embodiments, a conventional digital camera or similar device can be used to perform a rough head location for a user. Any of a number of conventional image analysis approaches can be used to approximate the head position of a user. This approximation can be used to further reduce the resources needed to process IR images, for example, as the device can know ahead of time the approximate location of the user's head and can exclude areas substantially outside that area from consideration or analysis. In some embodiments that must account for image offset due to the use of multiple cameras, a representative portion can be selected from one IR image, such as may be based upon distinctive features or some other such aspect within the determined head region of the user, and an algorithm can attempt to match that portion with a region of the other IR image that can be based, at least in part, upon the head position of the user. The matching process thus can use a sliding window and utilize a maximum match value, minimum difference value, or other such value to determine the likely match position. An additional benefit of determining the image offset for the match position, in addition to being able to align the images, is that the offset can indicate an approximate distance to the object (e.g., user) being imaged. The distance can be useful in properly interpreting movement, such as to determine gaze direction of a user.
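One common way to relate image offset to distance, offered here only as a hedged sketch, is the pinhole stereo relation: distance is approximately the focal length (in pixels) times the separation between the capture elements, divided by the offset (in pixels). The disclosure states only that the offset can indicate an approximate distance; the pinhole model, parallel optical axes, and all parameter names and values below are assumptions made for illustration.

```python
def distance_from_offset(offset_px, baseline_m, focal_length_px):
    """Estimate distance to the imaged object from the image offset.

    Assumes two identical pinhole cameras with parallel optical axes:
        distance = focal_length_px * baseline_m / offset_px

    offset_px:       offset of the matched feature between the images.
    baseline_m:      separation between the two capture elements (meters).
    focal_length_px: focal length expressed in pixels (hypothetical value).
    """
    if offset_px <= 0:
        raise ValueError("offset must be positive for a finite distance")
    return focal_length_px * baseline_m / offset_px


# Hypothetical numbers: elements 10 cm apart, 800 px focal length.
near = distance_from_offset(offset_px=160, baseline_m=0.10, focal_length_px=800)
far = distance_from_offset(offset_px=40, baseline_m=0.10, focal_length_px=800)
```

Under these assumptions a larger offset corresponds to a closer user, which is consistent with the observation that the offset approaches zero only when the user is very far away.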

Many other alternatives and variations are described and suggested below in relation to at least some of the various embodiments.

FIG. 1 illustrates an example of an electronic computing device 100 that can be used in accordance with various embodiments. This example device includes a display element 112 for displaying information to a user as known in the art. The example device also includes at least one orientation-determining element 108, such as an accelerometer or gyro element, which can be used to determine orientation and/or motion of the device, which can help to interpret motion in a captured image using various approaches described herein. The device also includes at least one image capture element for capturing image information about the user of the device. The imaging element may include, for example, a camera, a charge-coupled device (CCD), a motion detection sensor, or a radiation sensor, among many other possibilities. The example device in FIG. 1 includes an infrared (IR) emitter 102 and two IR detectors 104, 106 (although a single detector and two emitters could be used as well within the scope of the various embodiments). In other embodiments, as discussed herein, a device could instead include two ambient light cameras in place of the two detectors 104, 106, and can utilize ambient light and/or light from the display element 112. The IR emitter 102 can be configured to emit IR radiation, and each detector can detect the IR radiation reflected from a user (or other such surface or object). By offsetting the detectors in this example, each detector will detect radiation reflected at different angles.

In the example illustrated in FIG. 1, a first IR detector 104 is positioned substantially adjacent to the IR emitter 102 such that the first IR detector will be able to capture the infrared radiation that is reflected back from a surface, such as a viewer's retinas, in a direction that is substantially orthogonal to the capture plane of the detector. The second IR detector 106 is positioned a distance away from the IR emitter 102 such that the detector will only detect IR radiation reflected at an angle with respect to the orthogonal direction. When imaging a retro-reflector such as a user's retina, the second IR detector will detect little to no reflected radiation due to the IR emitter, as the retina will not significantly reflect in the direction of the second detector (although defects, particulates, or variations may deflect some of the radiation). As discussed later herein, this difference among images can be used to determine the position (and other aspects) of the retinas of a user, as the difference in IR reflection between the two images will be significant near the pupils or other such features, but the remainder of the images will be substantially similar.

In an alternative embodiment, a computing device utilizes a pair of IR emitters (e.g., IR light emitting diodes (LEDs), IR laser diodes, or other such components), to illuminate a user's face in a way that is not distracting (or even detectable) to the user, with the reflected light being captured by a single IR sensor. The LEDs are separated a sufficient distance such that the sensor will detect reflected radiation from a pupil when that radiation is emitted from the LED near the sensor, and will not detect reflected radiation from the pupil when that radiation is emitted from the LED positioned away from the sensor. The sensor can capture IR images that enable the device to analyze features of the user that reflect IR light, such as the pupils or teeth of a user. An algorithm can attempt to calculate a position in three-dimensional space (x, y, z) that corresponds to a location equidistant between the user's eyes, for example, and can use this position to track user movement and/or determine head motions. A similar approach can be used that utilizes a single IR emitting diode and a pair of IR sensors, as discussed above. Thus, the device can either direct IR from two locations or detect IR from two locations, with only one of those locations receiving retro-reflected radiation from a user's retinas. Other embodiments can utilize other approaches for performing head tracking, such as by requiring a user to wear glasses that emit IR radiation from a point source, etc.

In some embodiments it can be preferable to utilize a single emitter and two cameras when using single wavelength IR (e.g., 940 nm) in two directions, as using a single camera might be cheaper but also requires that images from the different directions be captured at different times. A downside to capturing images at different times is that movement during that period can affect the determination, even for capture frequencies on the order of 30 Hz (or 15 Hz for two cameras to get the same resolution). An advantage to a multi-camera system is that the images can be captured substantially simultaneously, such that movement between images is minimized. A potential downside to such an approach, however, is that there can be optical variations in the images due to the images being captured from two different points of view.

In one embodiment, a single detector can be used to detect radiation reflected at two different wavelengths. For example, a first LED could emit radiation at a wavelength (e.g., 940 nm) that is reflected by the retina, and a second LED could emit radiation at a wavelength (e.g., 1100 nm) that is absorbed by the cornea and/or other portions of the human eye. Specific wavelengths can be selected within selected wavelength ranges, based at least in part upon their reflective properties with respect to the human eye. For example, experiments indicate that light has less than a 50% absorption rate (for the typical human eye) under about 940 nm, above 50% absorption between about 940 nm and about 1030 nm, around 50% absorption for wavelengths between about 1040 nm and about 1100 nm, and about 100% absorption at 1150 nm and above. Thus, emitters can be selected that fall within at least some of these ranges, such as a first IR emitter that has significantly less than 50% absorption and a second IR emitter that has significantly greater than 50% absorption. The specific wavelengths can further be based, in at least some embodiments, upon the wavelengths of available devices. For example, an available laser diode at 904 nm can be selected that has a relatively low absorption rate, and an available laser diode at 980 nm or 1064 nm can be selected that has a relatively high absorption rate. In some embodiments, the power output of the higher wavelength diode can be scaled up to substantially match the perceived brightness of the lower wavelength diode by a CMOS sensor (or other such detector), the sensitivity of which might fall off to around zero at a value of about 1100 nm, such that in at least one embodiment the two emitters have wavelengths of 910 nm and 970 nm.

An advantage to using two wavelengths is that the LEDs can emit the radiation simultaneously, as long as a resulting image is able to be decomposed in order to extract image information corresponding to each wavelength. Various approaches for decomposing such an image are discussed elsewhere herein. The LEDs then could both be positioned near the camera, or a single LED or emitter can be used near the camera if that LED operates at (at least) the two frequencies of interest.

The emitter(s) and detector(s), and any ambient light camera(s) or other image capture element(s), can be positioned on the device in locations that are least likely to interfere with the user's operation of the device. For example, if it is determined that average users hold the device by the middle of either side of the device and primarily on the right side or on the bottom of the device, then the emitter and detectors can be positioned at the corners of the device, primarily on the left-hand side or top of the device. In another embodiment, there may be additional IR emitters (not shown) positioned on the device that transmit IR at different frequencies. By detecting which frequencies are received by the detectors, the device can determine specific information as to the orientation of the user's gaze.

In some embodiments, it might be useful for a user to participate in a calibration process which accounts for aspects such as the strength of eye reflection from the user, as well as to determine dimensions, calibrate gaze direction determinations, etc. Such an approach also can be useful if a user uses glasses that reduce the reflective capability, etc.

As discussed, using multiple input mechanisms can help to interpret information captured about each viewer, such as the movement of a viewer's pupils or other features. For example, the device can include a touch-sensitive element 110 around at least a portion of the device 100. A material similar to that used with a touch-sensitive display element can be used on the back and/or sides of the device. Using such material, the device is able to determine whether a user is actively holding the device. Such information could be used to perform a first input for detected motion if the user is holding the device, and a second input if the user is not holding the device. In addition to determining whether the user is holding the device, the system can determine, through use of the touch-sensitive element, which portions of the device are covered by the user. In such an embodiment, multiple IR emitters may be positioned on the device at different locations, and based on where the user is holding the device (i.e., which IR emitters are covered vs. not covered), the system can determine which IR emitters to use when capturing images.
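The emitter selection described above could be sketched as a simple filter over emitter positions. Everything in this sketch is hypothetical: the emitter names, the device coordinate system, and the representation of covered areas as rectangles reported by the touch-sensitive element are assumptions chosen for illustration.

```python
def uncovered_emitters(emitters, covered_regions):
    """Return the ids of emitters that are not under the user's hand.

    emitters:        dict mapping an emitter id to its (x, y) position.
    covered_regions: list of (x0, y0, x1, y1) rectangles reported as
                     touched, in the same (assumed) device coordinates.
    """
    def is_covered(position):
        x, y = position
        return any(
            x0 <= x <= x1 and y0 <= y <= y1
            for x0, y0, x1, y1 in covered_regions
        )

    return [eid for eid, pos in emitters.items() if not is_covered(pos)]


# Hypothetical layout: three emitters, with the user's hand over the
# lower-right corner of the device.
emitters = {
    "top_left": (0, 0),
    "top_right": (10, 0),
    "bottom_right": (10, 20),
}
touched = [(8, 15, 12, 22)]
usable = uncovered_emitters(emitters, touched)
```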

The example device in FIG. 1 also includes a light-detecting element 116 that is able to determine whether the device is exposed to ambient light or is in relative or complete darkness. Such an element can be beneficial in a number of ways. In certain conventional devices, a light-detecting element is used to determine when a user is holding a cell phone up to the user's face (causing the light-detecting element to be substantially shielded from the ambient light), which can trigger an action such as the display element of the phone to temporarily shut off (since the user cannot see the display element while holding the device to the user's ear) and privacy detection to be temporarily disabled. The light-detecting element could be used in conjunction with information from other elements to adjust the functionality of the device.

Further, a light-detecting sensor can help the device compensate for large adjustments in light or brightness, which can cause a user's pupils to dilate, etc. For example, when a user is operating a device in a dark room and someone turns on the light, the diameters of the user's pupils will change. As with the example above, if the device includes a display element that can operate in different modes, the device may also switch modes based on changes in the user's pupil dilation. In order for the device to not improperly interpret a change in separation between the device and user, the light detecting sensor might cause gaze tracking to be temporarily disabled until the user's eyes settle and a recalibration process is executed. Various other such approaches to compensate for light variations can be used as well within the scope of the various embodiments.

The example device 100 in FIG. 1 is shown to also include a microphone 114 or other such audio-capturing device. The device in at least some embodiments can also determine various actions based upon sound detected by the microphone. For example, if the device is in a pocket or bag, the microphone might be significantly covered by a material, which can affect the quality of sound recorded. The device then can lock out certain functionality, such as to at least temporarily disable image tracking.

In the example configuration of FIG. 1, each imaging element 104, 106 is on the same general side of the computing device as a display element, such that when a user is viewing the interface in the display element the imaging element has a viewable area that, according to this example, includes the face of the user. While in some embodiments the imaging element is fixed relative to the device, in other embodiments the imaging element can be operable to track the position of the user, such as by rotating the imaging element or an optical element (e.g., a lens, mirror, etc.) that directs light to the imaging element. Although embodiments described herein use examples of the viewable area including the face of the user, the viewable area may include other portions of the body such as arms, legs, and hips, among other possibilities. In any case, the viewable area of an imaging element can be configured to obtain image information corresponding to at least a portion of a user operating the device, and if the imaging element is continually (or at least substantially continually) capturing or otherwise obtaining image information, then any movement of the user relative to the device (through movement of the user, the device, or a combination thereof) can cause a position or orientation of at least one aspect of that user within the viewable area to change.

FIG. 2 illustrates a set of basic components of an example computing device 200 such as the devices described with respect to FIG. 1. While a portable smart device is depicted in many examples herein, the computing device could be any appropriate device able to receive and process input commands, such as a personal computer, laptop computer, television set top box, cellular phone, PDA, electronic book reading device, video game system, or portable media player, among others. In this example, the device includes a processor 202 for executing instructions that can be stored in a memory device or element 204. As known in the art, the device can include many types of memory, data storage or computer-readable media, such as a first data storage for program instructions for execution by the processor 202, a separate storage for images or data, a removable memory for sharing information with other devices, etc. The device typically will include some type of display element 206, such as a liquid crystal display (LCD), although devices such as portable media players might convey information via other means, such as through audio speakers. As discussed, the device in many embodiments will include at least one imaging element 208 such as a camera, sensor, or detector that is able to image a facial region of a user. The imaging element can include any appropriate technology, such as a CCD imaging element having a sufficient resolution, focal range and viewable area to capture an image of the user when the user is operating the device. Methods for capturing images using an imaging element with a computing device are well known in the art and will not be discussed herein in detail. It should be understood that image capture can be performed using a single image, multiple images, periodic imaging, continuous image capturing, image streaming, etc. Further, a device can include the ability to start and/or stop image capture, such as when receiving a command from a user, application or other device.

In some embodiments, the device can have sufficient processing capability, and the imaging element and associated analytical algorithm(s) may be sensitive enough to distinguish between the motion of the device, motion of a user's head, motion of the user's eyes and other such motions, based on the captured images alone. In other embodiments, such as where it may be desirable for the process to utilize a fairly simple imaging element and analysis approach, it can be desirable to include at least one orientation determining element 210 that is able to determine a current orientation of the device 200. In one example, the at least one orientation determining element is at least one single- or multi-axis accelerometer that is able to detect factors such as three-dimensional position of the device and the magnitude and direction of movement of the device, as well as vibration, shock, etc. Methods for using elements such as accelerometers to determine orientation or movement of a device are also known in the art and will not be discussed herein in detail. Other elements for detecting orientation and/or movement can be used as well within the scope of various embodiments for use as the orientation determining element. When the input from an accelerometer or similar element is used along with the input from the camera, the relative movement can be more accurately interpreted, allowing for a more precise input and/or a less complex image analysis algorithm.

In some embodiments, the device can include at least one additional input device 212 able to receive conventional input from a user. This conventional input can include, for example, a push button, touch pad, touch-sensitive element used with a display, wheel, joystick, keyboard, mouse, keypad or any other such device or element whereby a user can input a command to the device. Some devices also can include a microphone or other audio capture element that accepts voice or other audio commands. For example, a device might not include any buttons at all, but might be controlled only through a combination of visual and audio commands, such that a user can control the device without having to be in contact with the device. As will be discussed later herein, functionality of these additional input devices can also be adjusted or controlled based at least in part upon the determined gaze direction of a user or other such information.

When using a computing device with multiple capture elements separated some distance on the device, there will be some lateral offset of objects contained in images captured by those elements. For example, FIG. 3 illustrates an example configuration 300 wherein a computing device 302 has a first image capture element 304 positioned near a top edge of the device and a second image capture element 306 positioned near a bottom edge of the device. Unless the device has adjustable capture elements and image alignment software, the image capture elements likely will point in a direction substantially orthogonal to the plane of the device face corresponding to each element. Thus, the field of view of each camera will be different, with a lateral offset corresponding substantially to the lateral distance between the image capture elements and the distance to the object being imaged. In FIG. 3, the image capture elements are capturing image information corresponding to a user 312 of the device. The field of view 308 corresponding to the first image capture element 304 at a distance corresponding to a position of the user 312 will be slightly above the field of view 310 of the second image capture element 306.

FIGS. 4(a) and 4(b) illustrate example images 400 captured by the first and second image capture elements, respectively, in the example of FIG. 3. In FIG. 4(a) the image corresponds substantially to the field of view 308 of the first image capture element 304. Image analysis software can use any of a number of algorithms known in the art or subsequently developed to locate an approximate position of a person's head in the image. In this example, the software generates a virtual box 402 surrounding the position of the user's head in the image. The distance from the top of the virtual box to the top of the image is a first distance d.

In FIG. 4(b) the image corresponds substantially to the field of view 310 of the second image capture element 306. In this example, the software generates another virtual box 404 surrounding the position of the user's head in the second image. The distance from the top of the virtual box to the top of the image captured using the second image capture element is a second distance d′. The offset of the images can effectively be determined by determining the difference in these distances, using a formula such as:

image offset = d − d′.

If features in the two images are to be aligned, such as to compare the reflections of common features in the two images, then at least one of the images must be adjusted by the amount of image offset.
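A minimal sketch of this basic offset determination follows, assuming that images are lists of pixel rows and that a head locator has already supplied the top edge of each virtual box. The function names and the fill value used for rows shifted in from outside the image are illustrative assumptions.

```python
def image_offset(d_first, d_second):
    """Return d - d', the difference in head-box distances from the top."""
    return d_first - d_second


def shift_rows(image, offset, fill=0):
    """Shift image content down by `offset` rows (up if negative).

    Rows that have no source row in the original image are filled with
    `fill`, an arbitrary placeholder value chosen for this sketch.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    shifted = []
    for row_index in range(height):
        source = row_index - offset
        if 0 <= source < height:
            shifted.append(list(image[source]))
        else:
            shifted.append([fill] * width)
    return shifted


# Head box top at row 3 in the first image (d) and row 1 in the second (d').
offset = image_offset(3, 1)
second_image = [[0], [9], [0], [0]]
aligned_second = shift_rows(second_image, offset)
```

After the shift, the feature that sat at row 1 of the second image sits at row 3, matching its position in the first image, so the two images can be compared pixel by pixel.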

For certain inputs that do not require precise location or orientation determination, such a basic image offset determination can be adequate. In many cases, however, the existing algorithms for locating an approximate location of a user's head are not sufficiently accurate to be used in tracking features such as the relative position and separation of a user's pupils or other such aspects.

FIGS. 5(a)-5(c) illustrate another example approach 500 that can be used to align images in accordance with various embodiments. In this example the images are shown to be ambient light (or similar) images, but it should be understood that similar approaches could be used with infrared images or other appropriate image files. In FIG. 5(a), a first image 502 is shown that was captured using a first image capture element of a device. An image alignment or similar algorithm can select an appropriate subset 504 of the image, such as a set of pixels in the image that meet a certain criteria, such as a specific position or level of distinctiveness. For example, an algorithm can be configured to select a region that is near the center of the image, such that the matching region will likely be contained within the image captured from the other capture element. Further, the region can be selected based at least in part upon some uniqueness or distinctiveness criterion. For example, selecting a region of a user's forehead might not be optimal as there might be many regions or locations on the user's face that essentially contain only skin of the user, without significant shadows, features, or blemishes. On the other hand, selecting a region that contains at least a portion of each of the user's eyes, as illustrated in the example of FIG. 5(a), can provide a location that will likely only match one location in the other image. Various algorithms exist for determining distinctive features from an image that can be used as well within the scope of the various embodiments. In embodiments where eye location and/or gaze direction of a user is determined as input, however, determining and matching the location of the eyes of a user can have additional benefits, as matching the area around the eyes will reduce the effects of any optical artifacts introduced into the images due to the optics, etc., such as geometric distortion.

FIG. 5(b) illustrates the selected portion 504 of the first image 502 that can be used to attempt to find a matching location in the second image 506, and thus determine the relative image offset. A matching algorithm can start at an appropriate location in the second image 506, such as may correspond to the coordinates where the portion was located in the first image 502. The initial location will likely not match (unless the user was very far away), such that the algorithm will need to adjust the location to determine a match. Since the algorithm can have access to information indicating the relative separation (including direction) of the image capture elements that captured the images, the algorithm can determine the appropriate direction (e.g., down in the example of FIG. 5(c)) to move to attempt to determine a match.

In FIG. 5(c), the selected image portion 504 is first compared with a first position in the second image 506. Although the example shows the selected image portion as being placed over the second image or otherwise positioned with respect to the image, it should be understood that any appropriate technique for comparing image portions can be used, which can involve comparing color or brightness values at corresponding pixel locations, for example, and need not actually perform a visual comparison. In some embodiments, at least a portion of the pixels (or other points or subsets) of the selected image portion 504 are compared with the corresponding portions of the second image, and if the values at those portions at least meet a minimum match threshold, the position can be determined to be the appropriate match position. If the location does not at least meet a minimum match threshold, the position of the selected image portion with respect to the second image can be shifted in the determined direction, and another comparison can be performed. The position can be moved, and the matching process repeated, until a location is found that at least meets a minimum matching threshold (or other such criterion), or until the edge of the image (or a maximum range of comparison) is reached. In cases where the user is too close to the device, or where the user moves between image captures, it might not be possible to determine a sufficient match.
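The shift-and-compare loop described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of any particular embodiment: images are assumed to be lists of rows of grayscale brightness values, the search direction is assumed to be downward, and the names match_fraction and find_match, the per-pixel tolerance tol, and the threshold min_match are all hypothetical.

```python
def match_fraction(patch, image, top, left, tol=10):
    """Fraction of patch pixels whose brightness is within `tol` of the
    image pixel at the corresponding location (hypothetical match score)."""
    rows, cols = len(patch), len(patch[0])
    hits = sum(
        1
        for r in range(rows)
        for c in range(cols)
        if abs(patch[r][c] - image[top + r][left + c]) <= tol
    )
    return hits / (rows * cols)


def find_match(patch, image, start_top, left, step=1, min_match=0.9, max_shift=None):
    """Shift the patch downward from `start_top` until the match threshold is met.

    Returns the matching top row, or None if the edge of the image (or the
    maximum range of comparison, `max_shift`) is reached without a match.
    """
    last_top = len(image) - len(patch)
    if max_shift is not None:
        last_top = min(last_top, start_top + max_shift)
    top = start_top
    while top <= last_top:
        if match_fraction(patch, image, top, left) >= min_match:
            return top
        top += step
    return None
```

A return value of None corresponds to the case where no sufficient match can be determined, such as when the user is too close to the device or moves between captures.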

In another embodiment, the intensity difference at each pixel location with respect to the selected portion can be determined (e.g., one subtracted from the other). The average difference, or some other measure of the difference, then can be used to determine an overall difference measurement for each location. Using such an approach, the minimum value would instead be used, as the match location would exhibit the lowest average difference between intensity values.
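A minimal sketch of the difference-based variant follows, assuming grayscale images stored as lists of rows and a vertical search at a fixed column. The function names and the choice of mean absolute difference as the "measure of the difference" are assumptions made for illustration.

```python
def mean_abs_difference(patch, image, top, left):
    """Average absolute intensity difference between the patch and the
    same-sized region of the image whose corner is at (top, left)."""
    rows, cols = len(patch), len(patch[0])
    total = sum(
        abs(patch[r][c] - image[top + r][left + c])
        for r in range(rows)
        for c in range(cols)
    )
    return total / (rows * cols)


def best_offset_by_difference(patch, image, left, tops):
    """Score every candidate row in `tops` and return the row with the
    lowest average difference, together with all of the scores."""
    scores = {top: mean_abs_difference(patch, image, top, left) for top in tops}
    best = min(scores, key=scores.get)
    return best, scores
```

Because a lower score indicates a better match here, the candidate with the minimum value is selected rather than the maximum.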

In some approaches, the matching process will compare the images over a minimum range or number of positions, and will determine at least one match score at each location. The distance between locations can be fixed in some embodiments, while in other embodiments the distance between locations (and the number of locations) can be determined at least in part based upon aspects of the one or more images. For example, in some embodiments a relative head size can be determined with respect to the image, such that when the head occupies more of the image the distance between comparison locations can be larger, while images where the head occupies less of the image might require smaller distances between comparison locations in order to find an appropriate match location.

Further, since the matching is performed at a discrete set of locations, it is likely that the actual match point will fall at some point between two of the discrete locations. A curve-fitting or similar function can be applied to the match values to attempt to interpolate the precise match position based upon a maximum value position of the curve-fitting function. In some embodiments, the position will be moved until a maximum match point is reached and a minimum number or range of subsequent values have a lower match score, such that the match position likely has already been determined. In other embodiments, the entire range of match positions can be analyzed in order to prevent the inadvertent acceptance of a secondary maximum value in the fit curve.
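One simple curve-fitting choice, used here purely as an assumed example, is to fit a parabola through the best discrete score and its two neighbors and take the vertex as the interpolated match position. The sketch assumes evenly spaced positions and scores where higher is better; the name refine_peak is hypothetical.

```python
def refine_peak(positions, scores):
    """Interpolate the position of the maximum match score.

    Scans the entire range for the global maximum (to avoid accepting a
    secondary maximum), then fits a parabola through that sample and its
    two neighbors. Assumes `positions` are evenly spaced.
    """
    k = max(range(len(scores)), key=scores.__getitem__)
    if k == 0 or k == len(scores) - 1:
        return float(positions[k])  # no neighbor on one side; cannot refine
    y0, y1, y2 = scores[k - 1], scores[k], scores[k + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(positions[k])  # flat neighborhood; keep the discrete peak
    step = positions[k + 1] - positions[k]
    return positions[k] + 0.5 * step * (y0 - y2) / denom
```

For difference-based scores, where lower is better, the scores can be negated before calling the function.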

If an appropriate match location is determined, the offset distance corresponding to the differences in the match location in the two (or more) images can be used to properly align the images (at least mathematically) in order to ensure that the appropriate portions are being analyzed in each image. Such an approach can be particularly important for approaches such as IR retinal reflection, where the determination of retinal position, dimensions, and/or other such aspects relies upon differences between the images at corresponding locations.

Once the images are aligned, one or more algorithms can analyze the images to attempt to determine information about the images, such as the location of specific features in each image. As discussed above, certain embodiments utilize information about the user's eyes to attempt to determine information such as relative movement between the computing device and the user, as well as changes in gaze direction of the user. As discussed, an imaging element of a computing device can capture an image of at least a portion of a user of the device when the user is in front of the device (or at least within the viewing angle of an imaging element of the device), such as would normally occur when the user is viewing the display element of the device.

If the device includes software and/or hardware that is able to locate at least one feature of the user that can be consistently determined, such as the eyes, nose, or mouth of the user, then the device can analyze the image information to determine relative motion over a period of time and utilize that relative motion as input. For example, a user can tilt the device or rotate the user's head, such as to nod up and down, in a “yes” motion. Such motion can be detected and analyzed by the imaging element (e.g., camera) as the position of the user's eyes in the viewable area will move in the images. Further, aspects such as the imaged shape, size, and separation of the user's eyes also can change. Movement of the eyes in the viewable area could also be accomplished by moving the device up and down while the user remains still, as well as through other such motions. In some embodiments, the device is able to distinguish between movement of the user and movement of the device, such as by detecting movement of a background or other aspect of the images, or by analyzing the separation, shape, or size of various features. Thus, in embodiments described anywhere in this description that use an imaging element to determine an orientation or location of the device relative to its user, a user can have an option of inputting a given type of motion, corresponding to a specific command, by moving the device or altering an aspect of the user, or both.

As described above, when using the imaging element of the computing device to detect motion of the device and/or user, the computing device can use the background in the images to determine movement. For example, if a user holds the device at a fixed orientation (e.g., distance, angle, etc.) to the user and the user changes orientation to the surrounding environment, analyzing an image of the user alone will not result in detecting a change in an orientation of the device. Rather, in some embodiments, the computing device can still detect movement of the device by recognizing the changes in the background imagery behind the user. So, for example, if an object (e.g., a window, picture, tree, bush, building, car, etc.) moves to the left or right in the image, the device can determine that the device has changed orientation even though the orientation of the device with respect to the user has not changed.

In some cases, relative movement could be open to multiple interpretations. For example, in one application a device might be programmed to perform a first action if the device is moved up and/or down, and a second action if the device is instead tilted forward or backward. As should be apparent, each action can correspond to the position of the user's eyes moving up and/or down in the viewable area. In some embodiments, as will be discussed below, the camera and detection may be sensitive enough to distinguish between the two motions with respect to how the user's face changes in the captured images, such as the shape and separation of various features or other such aspects. In other embodiments, where it may be desirable for the process to utilize a fairly simple imaging element and analysis approach, it can be desirable to include at least one orientation determining element (e.g., an accelerometer or gyro) in the device that is able to determine a current orientation of the device. In one example, the at least one orientation determining element includes at least one single- or multi-axis accelerometer that is able to detect factors such as three-dimensional position of the device, the magnitude and direction of movement of the device, as well as vibration, shock, etc. Other elements for detecting orientation and/or movement can be used as well within the scope of various embodiments for use as the orientation determining element. When the input from an accelerometer is used with the input from the camera, the relative movement can be more accurately interpreted, allowing for a wider range of input commands and/or a less complex image analysis algorithm. For example, use of an accelerometer can not only allow for distinguishing between lateral and rotational movement with respect to the user, but also can allow for a user to choose to provide input with or without the imaging element. Some devices can allow a user to specify whether input is to be accepted from the imaging element, the orientation determining element, or a combination thereof.

The computing device can store, or otherwise have access to, at least one algorithm to analyze the captured images, as may be stored at least temporarily on the device itself, or can send the images to be analyzed by a remote computer or service, etc. Any of a number of algorithms can be used to analyze images, detect features, and track variations in the positions of those detected features in subsequent images. For example, FIG. 6(a) illustrates an image of a face 600 of a user of a device as could be captured (e.g., obtained or imaged) by an imaging element of the device. Thus, the face 600 is depicted as perceived by the imaging element of the device. As can be seen in FIG. 6(a), and also in the eye-specific view of FIG. 6(b), there are various aspects of the user's face that can be located and measured, such as the perceived width and height of a user's eyes, the perceived relative separation of a user's eyes and the perceived relative position of the user's eyes to an edge of the user's face when facing the device. Any number of other such measurements or aspects can be used as should be apparent. When a user tilts or translates the device, or moves his or her head in any direction, there will be a corresponding change in at least one of these measured aspects in subsequent images that are obtained. For example, if the user tilts his or her head right or left, the horizontal distance f in FIG. 6(a) between the user's eyes and an edge of a side of the user's face will change. In a similar manner, if the user tilts his or her head up or down, the vertical distance g between the user's eyes and an edge of the top of their head will change. Further, the shape or horizontal measurements a and b and the shape or vertical measurements e and h of the user's eyes will change and can change by different amounts. The separation distance c between the eyes can change as well. Using such information, the device can determine a type of motion that occurred and can use this information to help interpret the movement of the user's pupils or other such information.

For example, FIGS. 7(a)-7(c) illustrate the movement of a user's pupils with respect to the user's eye position. In some embodiments, the user's pupil position relative to the user's eye position can be at least partially indicative of the gaze direction of the user. For example, assuming the user is facing toward the device, in FIG. 7(a) the user is gazing forward, while in FIG. 7(b) the user is gazing downward and in FIG. 7(c) the user is gazing to the left (in the figure). Such information by itself, however, may not be sufficient to determine gaze direction. For example, if the user had tilted his or her head up (or back) while making the pupil movement in FIG. 7(b), the user might actually be looking forward (or even ‘up’ relative to the previous position). Further, if the user translates his or her head to the left or right in FIG. 7(a), but does not adjust the position of the pupils with respect to the user's eyes, then the viewing location would actually change even though the user is still looking straight ahead. Thus, in certain embodiments, it can be advantageous to utilize facial measurement approaches to interpret the pupil movements of FIGS. 7(a)-7(c).

When using an imaging element of the computing device to detect motion of the device and/or user, for example, the computing device can use the background in the images to determine movement. For example, if a user holds the device at a fixed orientation (e.g., distance, angle, etc.) to the user and the user changes orientation to the surrounding environment, analyzing an image of the user alone will not result in detecting a change in an orientation of the device. Rather, in some embodiments, the computing device can still detect movement of the device by recognizing the changes in the background imagery behind the user. So, for example, if an object (e.g., a window, picture, tree, bush, building, car) moves to the left or right in the image, the device can determine that the device has changed orientation, even though the orientation of the device with respect to the user has not changed. In other embodiments, the device may detect that the user has moved with respect to the device and adjust accordingly. For example, if the user tilts their head to the left or right with respect to the device, the content rendered on the display element may likewise tilt to keep the content in orientation with the user.

In some embodiments, the accuracy of the image capture and detection can be such that gaze direction and/or field of view can be determined based substantially on pupil-related information. In one embodiment, image analysis can be performed to locate the position of the user's pupils. The dimensions of the pupils themselves, as well as position and separation, can be indicative of changes in the user's gazing direction. For example, in addition to determining that pupils move from left to right in adjacently-captured images, the device can determine, due to small changes in the width of each pupil, whether the user position with respect to the device has translated. Similarly, the device can determine whether the user rotated his or her eyes, which would result in changes in diameter since the eyes are spherical and changes in rotation will result in changes in the captured dimensions. By being able to precisely measure pupil-related dimensions, the device can track the field of view of the user with respect to the device.

Another benefit to being able to accurately measure pupil-related dimensions is that the device can also determine a focus depth of the user. For example, if the user focuses on a point “farther away” from the user, the device can detect a change in separation of the pupils. Because the device can also measure the dimensions of the pupils in the image, the device can also determine that the increase was not due to an action such as a decrease in the distance between the user and the device. Such information can be useful for three-dimensional images, for example, as the device can determine not only a viewing location, but also a depth at which the user is focusing in order to determine where the user is looking in three-dimensional space.

While user information such as pupil measurements can be determined through various image analysis approaches discussed above, conventional image analysis algorithms are relatively processor-intensive and can require a significant amount of memory. Conventional portable devices, such as cellular phones and portable media players, might not have the necessary resources to perform such real-time image analysis, particularly at the resolution needed to detect small variations in pupil diameter. Further, in order for the image capture to work there must be a sufficient amount of ambient light, such that if a user is reading an electronic book on a device with a display such as an electronic paper display that does not generate significant illumination as would an LCD or similar display element, there might not be enough light to adequately capture the necessary image information.

FIGS. 8(a)-8(c) illustrate an example process for determining pupil or retina parameters using infrared radiation that can be used in accordance with various embodiments. In this example, a first image is shown in FIG. 8(a) that was captured using a sensor positioned near an infrared source, such that each retina substantially reflects the infrared radiation back towards the sensor. FIG. 8(b) illustrates another image captured using a sensor positioned away from an infrared source, such that any IR radiation reflected by the retinas is not directed towards, or detected by, the sensor. Thus, as can be seen, the most significant difference between the two images is the reflection by the retinas. Using simple image comparison or subtraction algorithms, for example, the retinas can quickly be extracted from the images once the images are aligned using a process such as those discussed above. If noise is sufficiently filtered out, using any appropriate method known in the art, the resultant image in FIG. 8(c) will include substantially only the reflection from the retinas, which can quickly be analyzed with very little resource allocation.
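The subtraction step can be illustrated with a short sketch. It assumes two already-aligned grayscale images of equal size, stored as lists of rows, with the first captured near the IR source and the second captured away from it. The fixed noise_floor threshold stands in for whatever noise filtering an embodiment actually uses, and the function names are hypothetical.

```python
def retina_difference(on_axis, off_axis, noise_floor=30):
    """Subtract the off-axis image from the on-axis image pixel by pixel,
    zeroing any difference that does not exceed `noise_floor`."""
    return [
        [a - b if a - b > noise_floor else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(on_axis, off_axis)
    ]


def bright_pixels(diff):
    """Return (row, column) coordinates of the pixels that survived the
    subtraction, i.e. the candidate retinal reflections."""
    return [
        (r, c)
        for r, row in enumerate(diff)
        for c, value in enumerate(row)
        if value > 0
    ]
```

The resulting coordinate list is typically far smaller than the full image, which is consistent with the low resource cost described above.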

As with the analysis of conventional full-color images described above, however, the resolution of the IR-based approach described above might not be sufficient to track gaze direction or field of view for all applications. In such cases, it can be beneficial to utilize additional input mechanisms and/or additional IR emitters and detectors to help interpret or enhance the captured information. At least some of these additional elements shall be referred to herein as “environment-determining input elements,” as the additional elements are operable to determine at least one aspect relating to the environment surrounding the device, such as light or noise surrounding the device, a relative orientation of the device to the surroundings, whether a user is holding the device, etc. While use of IR emitters and detectors is described herein, any type of facial or movement recognition technique may be used with the embodiments described herein.

FIG. 9 illustrates an example process 900 for providing input to a computing device based at least in part on imaged aspects of a user in accordance with one embodiment. As should be understood, the described process is merely an example, and there can be additional, fewer, or alternative steps performed in similar or alternative orders, or in parallel, in accordance with the various embodiments. In this example, facial tracking (or other relative user position-based tracking as discussed herein) is activated 902, either manually or automatically upon startup or in response to another appropriate action or occurrence, such as opening a particular application on the device. At an appropriate time (such as at regular intervals) a first image is captured from a first location 904, such as by a first image capture element at a first position on the computing device. At another appropriate time (such as concurrent with the first image capture or at a time shortly thereafter), a second image is captured from a second location 906, such as by a second image capture element at a second position on the computing device. As discussed, if infrared radiation is being used to determine retinal reflection, the image capture elements (e.g., IR sensors) should be positioned such that one image capture element will receive IR reflected from a user's retina and the other image capture element will not receive reflected IR. It should be understood, however, that any separation will only work for a range of distances between the user and the device, past which both capture elements can detect reflected IR.

A representative portion of the first image is determined 908, such as by using one or more algorithms to select a unique or distinctive region as discussed above. The size of the selected region can be based upon any of a number of factors, and can be increased in some embodiments until the distinctiveness reaches a minimum level. It can be desirable in certain embodiments to minimize the size of the representative portion, in order to reduce the processing capacity and time needed to locate a matching portion in the second image. Larger portions can result in more accurate results, however, so different algorithms can balance the tradeoff resulting from the size of the portion to be used for matching. An algorithm then can attempt to locate a matching portion in the second image 910, such as by starting at a specified location and moving the representative portion comparison in a direction corresponding to the offset of the capture elements as discussed above. In some embodiments, a different representative portion can be selected if no match is found. In other embodiments, another set of images is captured to attempt to determine a match (such as where there was movement or another occurrence between image captures). Various other approaches can be used as well.

Once a match location is determined, the information for the images can be aligned 912 in order to properly correlate features in the images. At least one feature of interest can be located in the aligned images using image recognition or another such process or algorithm 914. When using IR radiation, for example, the process can attempt to locate the pupils of the user (or any person) captured in the images. When the features of interest are located, an algorithm or process can attempt to determine differences between the features in the aligned images 916. For example, the process can determine the amount of light reflected (and captured) corresponding to the position of the pupil in one image and compare that to the corresponding amount of light captured at the position of the pupil in the second image. An algorithm or process then can measure or calculate at least one aspect with respect to these differences 918, such as the relative separation of a user's pupils, the relative location of the pupils with respect to a previously analyzed image, etc. Information about the measured aspects, such as an amount of movement or change in gaze direction, then can be provided to the computing device as input 920. As discussed, the input can be used by the device in any number of ways to control any of a number of aspects or functionality of the device.

As alluded to above, there can be some inaccuracy built into some of these approaches due to the fact that the images being compared may not be captured simultaneously. For example, in some embodiments a single detector is used to capture images using light of different wavelengths, IR radiation reflected from different IR emitters, or other such sources of reflected radiation. If there is rapid movement during image capture, an offset between images can be difficult to determine, as the positions of features will not be the same in both images, even taking the standard image offset into account. For a device attempting to determine gaze direction based on pupil location in a set of images, the result can be inaccurate as the gaze direction and/or eye position might be different in each image.

It thus can be desirable in at least some embodiments to capture the images with as little delay as possible. An approach in accordance with at least one embodiment takes advantage of the fact that many image capture elements do not capture an entire image simultaneously, as with conventional film-based cameras, but instead capture an image one scan line at a time. Thus, a digital camera, webcam, or other capture element having a sensor array corresponding to potentially millions of pixels can capture an image by scanning from a top row (or scan line) of the array down the array of sensors one row (or scan line) at a time. It should be understood that the orientation in which the sensor array operation is described is presented only for convenience of explanation, and that any appropriate orientation, scan direction, or other aspect or approach can be used as well within the scope of various embodiments.

If the computing device utilizes two radiation sources, such as two infrared emitters of substantially the same wavelength at different positions on the device or two emitters of different wavelengths, for example, and if the switching speed of those radiation sources is sufficient, the radiation sources can be turned on and off such that every other scan line captures radiation reflected for one of the radiation sources. For example, FIG. 10 illustrates an example wherein there are a number of scan lines for an image capture element 1000, and the radiation captured for each scan line can be alternated between light sources. In some embodiments, a controller can be in communication with the capture element and the radiation emitters such that the emitters are switched between scan lines of the capture element.

FIG. 11 illustrates an example of an image 1100 that can be captured using such an approach. In this example, the image captures IR light reflected from the pupil of a user, with a first light source being retro-reflected by the retina and a second, off-axis light source not being reflected to the capture element. As illustrated, a single image can essentially capture information for both light sources simultaneously, although at a slightly lesser resolution. The ability to capture the information in a single image significantly reduces the effects of movement on the position of features imaged using both light sources. Further, using a single capture element can reduce cost and eliminate parallax effects or distortion on the image(s).
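An interleaved image of this kind can be separated back into one half-resolution image per light source. The sketch below assumes the image is a list of scan lines (rows) and that the first source illuminated the first scan line; the function name split_scan_lines and the period parameter (the number of consecutive scan lines captured before the sources are switched) are hypothetical.

```python
def split_scan_lines(image, period=1):
    """Separate an interleaved image into the scan lines captured under
    each of two alternating light sources.

    `period` is the number of consecutive scan lines captured before the
    sources switch; period=1 means the sources alternate on every line.
    """
    source_a, source_b = [], []
    for index, row in enumerate(image):
        if (index // period) % 2 == 0:
            source_a.append(row)
        else:
            source_b.append(row)
    return source_a, source_b
```

Each returned image has roughly half the vertical resolution of the original, which matches the slightly lesser resolution noted above.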

As discussed, the time between capturing images using alternating light sources can be drastically reduced. For example, a sensor with 600 rows previously would have to capture all 600 scan lines of an image for one light source before switching to capture information for the other light source. By switching on each scan line, information for the other light source can be captured on the very next scan line, reducing the time between information capture to about 1/600 of the previous time.

In some cases, the emitters may not be able to switch at the speed needed to alternate scan lines for the capture sensor. In one embodiment, the speed between line captures of the sensor can be slowed enough to enable the switching. In another embodiment, there can be more than one source used for each type of light (e.g., orthogonal vs. off-axis or different wavelengths) such that each source can be activated for every fourth or sixth scan line instead of every second scan line, for example. In yet another embodiment, assuming sufficient resolution of the capture sensor, the light sources can be switched every third, fourth, fifth, or sixth line, etc., instead of every other scan line. Such an approach can enable the information to be captured for two or more light sources in a single image, while still using a conventional capture element and accounting for the switching speed of the light sources. Other timing factors can be considered as well, such as edges (e.g., ramp-up times or tails) of the intensity of the light from a given source, as the source will not have perfect “on” and “off” transitions, or hard edges, but will take a short period of time to turn on and off.

FIGS. 12(a) and 12(b) illustrate another example approach to distinctively capturing light reflected from more than one light source in a single image that can be used in accordance with at least one embodiment. Color filters such as Bayer filters are known in the art for selectively capturing light of a specific color at certain pixels of a sensor array, particularly for single-chip digital image sensors. Traditional Bayer filters include red, blue, and green filters (with twice as many green filters as red and blue filters), such that adjacent sensors will capture the intensity of light of different colors, and the array as a whole will only capture intensity of light for those three colors.

Approaches in accordance with various embodiments can utilize a different type of filter to selectively capture radiation reflected at different wavelengths. As discussed, a computing device can utilize two radiation sources, with one source in the range of wavelengths that is reflected by the human retina and another source in the range of wavelengths that is not reflected by the human retina (or that is absorbed by the cornea, for example). FIG. 12(a) illustrates an example filter 1200 that can be used with such a device. In this example “R” is used to refer to light of a first wavelength range and “G” is used to refer to light of a second wavelength range, but it should be understood that these letters are merely selected for convenience and do not imply specific requirements on the wavelength range of the filter. Further, although a substantially equal distribution of filter elements is shown for both ranges, it should be understood that the distribution can be uneven as well in other embodiments.

Using such a filter 1200, two radiation sources of different wavelengths, a single wide-band radiation source, or another such source of multiple wavelength radiation can be used to simultaneously illuminate the face of a user (or other aspect of an object or element of interest). Using the filter, a single image can be captured using a single sensor (e.g., a conventional CCD or CMOS sensor) that will reflect information for both wavelength ranges. For example, FIG. 12(b) illustrates an example image 1210 corresponding to the reflected light from a user's retina that can be captured using such an approach. As illustrated, adjacent pixels (or groups of pixels) indicate the intensity of light from each of the two wavelength ranges. In this example, the first wavelength range that is reflected from the retina is shown by dark areas in the image, while the corresponding second wavelength range that is not reflected by the retina does not appear dark at those positions in the image. If the resolution of the sensor array (and filter) is sufficient, this single image can be used to locate the position, size, and other aspects of a user's pupils (and other such objects).
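The comparison of adjacent pixels can be sketched as follows. The exact layout of filter 1200 is not specified here, so the sketch assumes a checkerboard arrangement in which a pixel at (row, column) captures the first ("R") range when row + column is even and the second ("G") range otherwise. It also assumes that larger stored values mean more captured radiation. The function name and the min_diff threshold are hypothetical.

```python
def retina_candidates(image, min_diff=50):
    """Flag first-range pixels that are markedly stronger than their
    horizontally adjacent second-range neighbors.

    Assumes a checkerboard filter: (row + col) even -> first range,
    (row + col) odd -> second range.
    """
    hits = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r + c) % 2 != 0:
                continue  # second-range pixel; used only as a neighbor
            neighbors = [row[n] for n in (c - 1, c + 1) if 0 <= n < len(row)]
            if neighbors and value - sum(neighbors) / len(neighbors) >= min_diff:
                hits.append((r, c))
    return hits
```

A region that is bright in both ranges, such as a specular highlight, is not flagged by this comparison, since only a difference between the two ranges is treated as a possible retinal reflection.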

Although many of the embodiments above provide for aligning images or capturing images that include distinguishable information for at least two sources, such approaches still can be insufficient in at least some embodiments to provide the level of precision needed to accurately provide input to a device. For example, if the device is tracking gaze direction then the device might need to also know how far away the user is from the device, in order to determine the appropriate angle corresponding to a lateral shift in position of the user's pupils. For example, a user a foot away from the device will show a much different change in pupil position in a captured image than a user three feet away from the device, even though the actual physical amount of movement might be the same. While aspects such as the separation and size of the pupils can be an indication of distance, variations between users (e.g., adults versus small children) can affect the precision of such determinations.

Accordingly, it can be desirable in at least some embodiments to also determine the distance to a user captured in the images. In some cases, a relative distance can be determined at least in part by comparing the apparent size of an object in the image with the known size (or an approximate size) of the object. For example, as illustrated in the example 1300 of FIG. 13(a), the distance to an object with height (in the figure) h will affect how large the object appears in the image. At a first distance d, the image height (based on the field of view at a current level of zoom) will be a height i, and the relative size of the object in the image will be given by h/i, where in FIG. 13(a) the object takes up approximately 50% of the height of the image. As illustrated in FIG. 13(b), as the distance to the object increases to a distance d′, the image height for the field of view at that distance is a larger height i′, but the height of the object is the same. The apparent height of the object in the image will decrease, however, as the ratio h/i′ now yields a value of approximately 30% of the overall height in the image. For objects with known height captured with a capture element with a known field of view, for example, an algorithm can determine an approximate distance to that object based on the relative size of the object in the image.
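One way such an algorithm could be sketched in Python is shown below. It assumes a pinhole model in which the height covered by the image at distance d is i = 2·d·tan(θ/2) for a vertical field of view θ, so that a measured fraction h/i yields d directly. The function name and the example values (a 0.2 m object and a 90 degree field of view) are assumptions for illustration only.

```python
import math


def distance_from_apparent_size(object_height, image_fraction, fov_degrees):
    """Estimate distance from the fraction of the image height an object fills.

    object_height  -- known (or approximate) physical height h of the object
    image_fraction -- measured ratio h/i, between 0 and 1
    fov_degrees    -- vertical field of view of the capture element
    """
    if not 0.0 < image_fraction <= 1.0:
        raise ValueError("image_fraction must be in (0, 1]")
    # i = 2 * d * tan(fov / 2) and image_fraction = h / i, solved for d.
    half_angle = math.radians(fov_degrees) / 2.0
    return object_height / (2.0 * image_fraction * math.tan(half_angle))


d_near = distance_from_apparent_size(0.2, 0.50, 90.0)
d_far = distance_from_apparent_size(0.2, 0.25, 90.0)
print(d_near, d_far)
```

Halving the fraction of the image that the object fills doubles the estimated distance, consistent with the relationship between FIG. 13(a) and FIG. 13(b).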

In many cases, however, the precise size of the object might not be known. For example, multiple users might utilize the device where each user can have features of different sizes. Further, users might alter their appearance, such as by changing a hair style, growing facial hair, or putting on weight, such that the calculation can be imprecise even for a known user.

Several embodiments discussed above capture images of a common object (e.g., a user) from multiple angles. Using parallax-type information, it is possible to get an improved measure of distance by utilizing a parallax analysis of the relative displacement or offset of the object between the images. For example, in FIG. 13(b) the distance from the center of the image to the center of the object (or a feature at the front center of the object) is given by a distance j. FIG. 13(c) shows the field of view for the second image capture element, separated a distance from the first image capture element. As can be seen, the distance from the center of the second image to the center of the object is a different distance, here a distance j′. As should be understood, the directions of the offsets can be the same or opposite in the images. The combined offset between the two images will necessarily decrease with an increase in distance to the object. Thus, a determination of distance can be made using the offset of a feature position in the two images. An advantage to such an approach is that the actual size of the feature does not matter as long as a consistent point is determined for the feature in each image that can be used to determine the offset.
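A minimal Python sketch of such a parallax calculation follows. It assumes two identical pinhole cameras with parallel optical axes, a known baseline separation, and a focal length expressed in pixels, in which case distance = focal length × baseline / disparity. These modeling assumptions, the function name, and the numeric values are illustrative and are not specified in the description above. Feature positions are given as signed pixel offsets from each image center, so offsets in the same or opposite directions are handled uniformly.

```python
def distance_from_parallax(offset_first_px, offset_second_px,
                           baseline, focal_length_px):
    """Estimate distance from the offsets of one feature in two images.

    offset_first_px, offset_second_px -- signed horizontal offsets of the
        same feature from the center of each image, in pixels
    baseline        -- separation between the two capture elements
    focal_length_px -- focal length expressed in pixels (assumed known)
    """
    disparity = abs(offset_first_px - offset_second_px)
    if disparity == 0:
        raise ValueError("zero disparity: object too distant to resolve")
    return focal_length_px * baseline / disparity


# Offsets in opposite directions (j = 20, j' = 20): disparity is j + j'.
print(distance_from_parallax(20.0, -20.0, baseline=0.1, focal_length_px=800.0))
# A smaller combined offset corresponds to a more distant object.
print(distance_from_parallax(10.0, -10.0, baseline=0.1, focal_length_px=800.0))
```

Because only the difference between the two offsets enters the calculation, the physical size of the feature never appears, matching the advantage noted above.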

In some cases, a combination of such approaches can be used to improve accuracy. For example, the information that can be obtained from an image can be limited to at least some extent by the resolution of the imaging element. Thus, combining distance measurement approaches in some embodiments can provide a more precise determination of distance. For example, FIG. 13(d) illustrates a first image 1302 and a second image 1304 of an object taken at a first distance, captured with respective first and second image capture elements. FIG. 13(e) illustrates the same first image 1302 and second image 1304 captured with the object at a second distance, greater than the first distance. As can be seen, the overall offset (the sum of j+j′) of the object in FIG. 13(d) is greater than the overall offset (the sum of j+j′) of the object in FIG. 13(e). Thus, through proper calibration and analysis the device can make a first determination of distance based on the relative offset, which decreases as the distance to the object increases. Also as can be seen, the apparent size of the object changes between FIG. 13(d) and FIG. 13(e). In embodiments where the device tracks the object, changes in apparent size also can be indicative of distance to the object. In embodiments where a user is recognized, such as through facial recognition or another such process, the apparent size also can be used to determine an initial distance to the user captured in a first image or set of images. In some embodiments, both approaches can be used and the results combined, with or without any weighting. As should be apparent, embodiments can use one or both of these approaches, and/or can combine one or both of these approaches with at least one other measurement approach known for such purposes.
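Combining the results could be as simple as a weighted average, sketched below in Python. The function name `combine_estimates` and the example weights are hypothetical; how weights would actually be chosen (for example, from the expected error of each approach) is not specified here.

```python
def combine_estimates(estimates):
    """Combine (distance, weight) pairs into one weighted-average distance.

    Equal weights reduce this to a plain average, i.e. combining the
    results without any weighting.
    """
    total_weight = sum(w for _, w in estimates)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d * w for d, w in estimates) / total_weight


# Hypothetical estimates: 1.0 from apparent size, 2.0 from parallax,
# with the parallax result trusted three times as much.
print(combine_estimates([(1.0, 1.0), (2.0, 3.0)]))  # 1.75
```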

Not all computing devices contain two emitters or detectors (or other such devices) positioned a sufficient distance apart on a device to determine distance using parallax. Still other devices might not rely solely (or at all) upon parallax to determine distance to a user or other object of interest. Accordingly, certain devices can utilize other mechanisms (in addition or alternative to apparent size in captured images) to attempt to determine distance.

FIG. 14 illustrates an example configuration 1400 that can be used in accordance with at least one embodiment, wherein the device includes an ultrasonic transceiver (or other such element(s)) capable of emitting a sonic pulse and detecting the reflected sonic pulse. As known in the art, since the speed of sound in a standard atmosphere is known within a degree of certainty, the distance to an object can be determined by measuring the amount of time needed for the pulse to travel to the object, be reflected by the object, and travel back to the ultrasonic device. As illustrated in FIG. 14, if the time it takes for a transmitted ultrasonic wave to reach the face of a user is t₁, and the time it takes for the reflected ultrasonic wave to arrive back at the device is t₂, then the distance to the object can be determined as a function of the sum of those times, or f(t₁+t₂). Approaches for determining distance based on the time of travel of a reflected wave are well known in the art and will not be discussed in detail herein.
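For illustration, one common form of the function f(t₁+t₂) is half the round-trip time multiplied by the speed of sound. The Python sketch below assumes a speed of approximately 343 m/s (dry air at about 20 °C) and that the object does not move during the measurement; the function and constant names are illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for dry air at about 20 °C


def distance_from_echo(t_out, t_back, speed=SPEED_OF_SOUND_M_PER_S):
    """Estimate distance from outbound time t1 and return time t2, in seconds.

    The pulse covers the distance twice, so the one-way distance is half
    of the total path length speed * (t1 + t2).
    """
    if t_out < 0 or t_back < 0:
        raise ValueError("travel times must be non-negative")
    return speed * (t_out + t_back) / 2.0


# A round trip of 2 ms corresponds to roughly a third of a meter.
print(distance_from_echo(0.001, 0.001))
```

In practice a device typically measures only the total round-trip time, which is why the estimate depends on the sum of t₁ and t₂ rather than on each value separately.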

Such an approach still may not provide the desired level of precision in all cases, however, as there is a period of time needed for the ultrasonic wave to travel to the object and back, and any significant relative movement of the user (or other object of interest) during that time can affect the accuracy of the distance determination. FIGS. 15(a) and 15(b) illustrate an example approach that can be used in accordance with at least one other embodiment. The image capture components of certain computing devices can contain automated focusing optics which can adjust an effective focal length of the image capture component in order to focus on the object of interest. In the example configuration 1500 of FIG. 15(a), the effective focal length f is shown to be too short, such that an object at a distance d will likely not be in focus, or will be at least somewhat out of focus. In the configuration 1502 of FIG. 15(b), the optical elements have been adjusted such that the focal length f of the image capture element substantially equals the distance d to the object of interest, such that the object is substantially in focus. In addition to ensuring that the object is in focus, the adjustment in effective focal length also can provide a measure of the distance to the object of interest, as in this case f=d.

Thus, through careful calibration (and possibly periodic recalibration) of the imaging optics, an algorithm or process can determine the approximate distance to an object based at least in part on the effective focal length. In some embodiments, an ambient camera might be used to focus on the user (and potentially provide other information such as user identity), and an infrared configuration might be used to detect gaze direction. Various other approaches can be used as well as discussed elsewhere herein. An advantage to such an approach is that the determination of distance and the capture of an image can be substantially simultaneous, such that movement of the user will not significantly impact the measurements. In some embodiments the focus will automatically adjust and track the position of the user, such that the position will be substantially accurate as long as the user does not move faster than the focusing optics can adjust. In some embodiments, the device can determine when an image was captured while a user was moving or otherwise out of focus, and that image can be discarded and/or a new image captured when the user is back in focus. Other methods for tracking and determining accuracy can be used as well within the scope of the various embodiments.
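A calibration-based lookup of this kind might be sketched in Python as follows. The sketch assumes that the focusing optics report a numeric lens position and that calibration produced a table of (lens position, in-focus distance) pairs. The table values, the units, and the function name `distance_from_focus` are hypothetical, and linear interpolation between calibration points is a simplifying assumption.

```python
from bisect import bisect_right


def distance_from_focus(lens_position, calibration):
    """Interpolate an in-focus distance from a calibration table.

    calibration -- list of (lens_position, distance) pairs sorted by
                   lens position, obtained from a calibration procedure
    """
    positions = [p for p, _ in calibration]
    # Clamp readings that fall outside the calibrated range.
    if lens_position <= positions[0]:
        return calibration[0][1]
    if lens_position >= positions[-1]:
        return calibration[-1][1]
    hi = bisect_right(positions, lens_position)
    (p0, d0), (p1, d1) = calibration[hi - 1], calibration[hi]
    t = (lens_position - p0) / (p1 - p0)
    return d0 + t * (d1 - d0)


# Hypothetical table: lens positions in motor steps, distances in meters.
table = [(0, 0.3), (50, 0.6), (100, 1.2)]
print(distance_from_focus(25, table))
print(distance_from_focus(75, table))
```

Periodic recalibration would amount to replacing the table, leaving the lookup unchanged.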

A number of other approaches can be used as well within the scope of the various embodiments. For example, thermal imaging or another such approach could be used to attempt to determine and track the position of at least some aspect of a human user. In many instances the imaging system is desired to be small and cheap enough for mass marketing, such that simple or conventional imaging approaches and components can be preferred. Certain existing cameras can detect infrared radiation, but typically utilize an IR filter. Utilizing these cameras without the IR filter, and potentially with an ambient light filter, can allow these relatively inexpensive cameras to be used as IR detectors.

Other conventional elements can be used to reduce the cost of a computing device able to perform approaches discussed herein, but might be less accurate and/or might require a larger device. For example, images can be split using beam splitters (e.g., silvered mirrors) such that half of the reflected light gets reflected to a different location (e.g., part of a sensor). Similarly, various optical elements such as an optical interferometer can be used to attempt to obtain accurate distance measurements.

As discussed with any optical approach, it can be desirable to perform at least an initial calibration procedure, as well as potentially additional and/or periodic recalibration. In one embodiment where two cameras are used, it can be advantageous to periodically capture images of a grid or similar pattern in order to calibrate for bends or physical changes in the optics. In some embodiments where an initial calibration is performed during the manufacturing process, the user might only need to have the device recalibrated when performance begins to degrade, or at any other appropriate time.

A computing device used for such purposes can operate in any appropriate environment for any appropriate purpose known in the art or subsequently developed. Further, various approaches discussed herein can be implemented in various environments for various applications or uses. For example, FIG. 16 illustrates an example of an environment 1600 for implementing aspects in accordance with various embodiments. As will be appreciated, although a Web-based environment is used for purposes of explanation, different environments may be used, as appropriate, to implement various embodiments. The environment 1600 shown includes a variety of electronic client devices 1602, which can include any appropriate device operable to send and receive requests, messages, or information over an appropriate network 1604 and convey information back to a user of the device. Examples of such client devices include personal computers, cell phones, handheld messaging devices, laptop computers, set-top boxes, personal data assistants, electronic book readers, and the like. Each client device can be capable of running at least one motion or orientation-controlled interface as discussed or suggested herein. In some cases, all the functionality for the interface will be generated on the device. In other embodiments, at least some of the functionality or content will be generated in response to instructions or information received from over at least one network 1604.

The network 1604 can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network, or any other such network or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected. Protocols and components for communicating via such a network are well known and will not be discussed herein in detail. Communication over the network can be enabled by wired or wireless connections, and combinations thereof. In this example, the network includes the Internet, as the environment includes a primary content provider 1606 and a supplemental content provider 1608. Each provider can include at least one Web server 1606 for receiving requests from a user device 1602 and serving content in response thereto, although for other networks an alternative device serving a similar purpose could be used as would be apparent to one of ordinary skill in the art.

Each content provider in this illustrative environment includes at least one application server 1612, 1614, 1622 or other such server in communication with at least one data store 1616, 1618, 1624. It should be understood that there can be several application servers, layers, and/or other elements, processes, or components, which may be chained or otherwise configured, which can interact to perform tasks such as obtaining data from an appropriate data store. As used herein the term “data store” refers to any device or combination of devices capable of storing, accessing, and retrieving data, which may include any combination and number of data servers, databases, data storage devices, and data storage media, in any standard, distributed, or clustered environment. An application server can include any appropriate hardware and software for integrating with the data store as needed to execute aspects of one or more applications for the client device, handling a majority of the data access and business logic for an application. The application server provides access control services in cooperation with the data store, and is able to generate content such as text, graphics, audio, and/or video to be transferred to the user, which may be served to the user by the Web server in the form of HTML, XML, or another appropriate structured language in this example. The handling of all requests and responses, as well as the delivery of content between the client device 1602 and an application server, can be handled by the respective Web server. It should be understood that the Web and application servers are not required and are merely example components, as structured code discussed herein can be executed on any appropriate device or host machine as discussed elsewhere herein. Further, the environment can be architected in such a way that a test automation framework can be provided as a service to which a user or application can subscribe. A test automation framework can be provided as an implementation of any of the various testing patterns discussed herein, although various other implementations can be used as well, as discussed or suggested herein.

Each data store can include several separate data tables, databases, or other data storage mechanisms and media for storing data relating to a particular aspect. For example, the page data store 1616 illustrated includes mechanisms for storing page data useful for generating Web pages and the user information data store 1618 includes information useful for selecting and/or customizing the Web pages for the user. It should be understood that there can be many other aspects that may need to be stored in a data store, such as access right information, which can be stored in any of the above listed mechanisms as appropriate or in additional mechanisms in the data store. Each data store is operable, through logic associated therewith, to receive instructions from a respective application server and obtain, update, or otherwise process data in response thereto. In one example, a user might submit a search request for a certain type of content. In this case, the data store might access the user information to verify the identity of the user, and can access the content information to obtain information about instances of that type of content. The information then can be returned to the user, such as in a results listing on a Web page that the user is able to view via a browser on the user device 1602. Information for a particular instance of content can be viewed in a dedicated page or window of the browser.

Each server typically will include an operating system that provides executable program instructions for the general administration and operation of that server, and typically will include a computer-readable medium storing instructions that, when executed by a processor of the server, allow the server to perform its intended functions. Suitable implementations for the operating system and general functionality of the servers are known or commercially available, and are readily implemented by persons having ordinary skill in the art, particularly in light of the disclosure herein.

The environment in one embodiment is a distributed computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 16. Thus, the depiction of the system 1600 in FIG. 16 should be taken as being illustrative in nature, and not limiting to the scope of the disclosure.

Various embodiments discussed or suggested herein can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices, or processing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless, and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system also can include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems, and other devices capable of communicating via a network.

Most embodiments utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as TCP/IP, OSI, FTP, UPnP, NFS, CIFS, and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, and any combination thereof.

In embodiments utilizing a Web server, the Web server can run any of a variety of server or mid-tier applications, including HTTP servers, FTP servers, CGI servers, data servers, Java servers, and business application servers. The server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more Web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, or any scripting language, such as Perl, Python, or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase®, and IBM®.

The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (“SAN”) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers, or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input device (e.g., a mouse, keyboard, controller, touch screen, or keypad), and at least one output device (e.g., a display device, printer, or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices, and solid-state storage devices such as random access memory (“RAM”) or read-only memory (“ROM”), as well as removable media devices, memory cards, flash cards, etc.

Such devices also can include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium, representing remote, local, fixed, and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services, or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or Web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets), or both. Further, connection to other computing devices such as network input/output devices may be employed.

Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules, or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

What is claimed is:
 1. A computer-implemented method of enabling a user to provide a control input to an electronic device, comprising: under control of one or more computing systems configured with executable instructions, illuminating, by a first infrared (IR) source of the electronic device, at least a portion of a face; illuminating, by a second IR source of the electronic device, at least the portion of the face; determining that a third IR sensor is covered, based at least in part on the third IR sensor not receiving reflected IR light; capturing, by a first IR sensor at a first time, a first image including reflected IR light from the first IR source; capturing, by a second IR sensor at the first time, a second image including reflected IR light from the second IR source; determining a first portion of the first image, the first portion including facial features that meet a minimum level of distinctiveness; locating a second portion of the second image, the second portion of the second image corresponding to the first portion of the first image; aligning the first portion of the first image and the second portion of the second image, based at least in part on first information; determining a first position of the face based at least on the first image; determining a second position of the face based at least on the second image; determining a relative position of the face based at least on a difference between the first position of the face and the second position of the face; and determining a control input for the electronic device based on the relative position of the face.
 2. The computer-implemented method of claim 1, wherein the first information includes information relating to at least a color, a brightness, a hue, a light level, one or more coordinates, or a pixel.
 3. The computer-implemented method of claim 1, wherein aligning the first portion of the first image and the second portion of the second image further comprises: identifying a first target portion in the second image based at least in part on one or more coordinates of the first portion of the first image; determining that a first match score between the first portion of the first image and the first target portion in the second image is lower than a threshold match score; identifying a second target portion in the second image; determining that a second match score between the first portion of the first image and the second target portion in the second image is above the threshold match score; and using the second target portion in the second image as the second portion of the second image.
 4. The computer-implemented method of claim 1, wherein determining the first portion of the first image further comprises: determining a first size of the first portion of the first image; determining that the facial features included in the first portion of the first image do not meet the minimum level of distinctiveness; determining a second portion of the first image, the second portion having a second size, the second size larger than the first size; determining that the facial features included in the second portion of the first image meet the minimum level of distinctiveness; and using the second portion of the first image in place of the first portion of the first image.
 5. The computer-implemented method of claim 1, further comprising determining an approximate distance between the electronic device and the portion of the face based at least on the first portion of the first image and the second portion of the second image.
 6. The computer-implemented method of claim 1, further comprising: illuminating, by a third IR source of the electronic device, at least the portion of the face; and capturing, by a fourth IR sensor, a third image including reflected IR light from the third IR source.
 7. The computer-implemented method of claim 1, further comprising: capturing, by a camera, an ambient light image including at least the portion of the face; determining an approximate head position of the face with respect to the electronic device based at least in part upon the ambient light image; and determining the relative position of the face based at least on the ambient light image, the first portion of the first image and the second portion of the second image.
 8. The computer-implemented method of claim 1, further comprising: illuminating, by a third IR source of the electronic device, at least the portion of the face; and capturing, by the second IR sensor, a third image including reflected IR light from the third IR source.
 9. The computer-implemented method of claim 1, further comprising: illuminating, by the first IR source, radiation within a first range of wavelengths capable of being substantially reflected by areas of the face; and illuminating, by the second IR source, radiation with a second range of wavelengths capable of being substantially absorbed by the areas of the face.
 10. A computing device, comprising: a first infrared (IR) source; a second IR source; a first IR sensor; a second IR sensor; a third IR sensor; at least one processor; memory including non-transitory instructions that, when executed by the processor, cause the computing device to: illuminate, by the first IR source, at least a portion of a face; determine that the third IR sensor is not receiving reflected IR light; determine, based at least on the third IR sensor not receiving reflected IR light that the third IR sensor is covered; illuminate, by the second IR source, at least the portion of the face; capture, by the first IR sensor at a first time, a first image including reflected IR light from the first IR source; capture, by the second IR sensor at the first time, a second image including reflected IR from the second IR source; determine a first portion of the first image, the first portion including facial features that meet a minimum level of distinctiveness; locate a second portion of the second image, the second portion of the second image corresponding to the first portion of the first image; align the first portion of the first image and the second portion of the second image, based at least in part on first information; determine a first position of the face based at least on the first image; determine a second position of the face based at least on the second image; determine a relative position of the face based at least on a difference between the first position of the face and the second position of the face; and determine a control input for the computing device based on the relative position of the face.
 11. The computing device of claim 10, wherein the first information includes information relating to at least a color, a brightness, a hue, a light level, one or more coordinates, or a pixel.
 12. The computing device of claim 10, wherein the memory further includes non-transitory instructions that cause the computing device to: identify a first target portion in the second image based at least in part on one or more coordinates of the first portion of the first image; determine that a first match score between the first portion of the first image and the first target portion in the second image is lower than a threshold match score; identify a second target portion in the second image; determine that a second match score between the first portion of the first image and the second target portion in the second image is above the threshold match score; and use the second target portion in the second image as the second portion of the second image.
 13. The computing device of claim 10, wherein the memory further includes non-transitory instructions that cause the computing device to: determine a first size of the first portion of the first image; determine that the facial features included in the first portion of the first image do not meet the minimum level of distinctiveness; determine a second portion of the first image, the second portion having a second size, the second size larger than the first size; determine that the facial features included in the second portion of the first image meet the minimum level of distinctiveness; and use the second portion of the first image in place of the first portion of the first image.
 14. The computing device of claim 10, wherein the first IR sensor is positioned substantially adjacent to the first IR source, and the second IR sensor is positioned substantially adjacent to the second IR source.
 15. The computing device of claim 10, wherein the first IR source emits radiation within a first range of wavelengths capable of being substantially reflected by areas of the face, and wherein the second IR source emits radiation within a second range of wavelengths capable of being substantially absorbed by the areas of the face.
 16. The computing device of claim 10, further comprising: a third IR source, wherein the memory further includes non-transitory instructions that cause the computing device to: illuminate, by the third IR source, at least the portion of the face; and capture a third image using the first IR sensor, the third image including reflected IR from the third IR source.
 17. The computing device of claim 10, wherein the memory further includes non-transitory instructions that cause the computing device to: determine an approximate distance between the computing device and the portion of the face based at least on the first portion of the first image and the second portion of the second image.
 18. The computing device of claim 10, further comprising: a third IR source, wherein the memory further includes non-transitory instructions that cause the computing device to: determine that the third IR sensor is not receiving reflected IR light from the third IR source.
 19. The computing device of claim 10, further comprising: a camera, wherein the memory further includes non-transitory instructions that cause the computing device to: capture, using the camera, at least one ambient light image including at least the portion of the face; and determine an approximate head position of the face with respect to the computing device based at least in part upon the at least one ambient light image. 